How to Write a Research Paper (or How to Graduate Quickly)?

DB Group Summer Seminar

Dongwon Lee

May 19, 2005



Announcement

• Over the summer, Prof. Mitra’s and Prof.
Lee’s groups will have a joint DB seminar

• Goals:

- Forum for practice talks

- Learn what others are working on

- Get fresh ideas from others’ works

- Find collaborators for your research

- Get to know each other



Justification

• I am probably a qualified person to give a talk 
on this topic... because 

• I’m still STRUGGLING to publish

• I do get rejections a lot :-( 

• I’m still learning from failures 

• What’s being presented here is purely my 
suggestion 

• Take it or leave it - up to you!!



What is the Goal of a Research Paper?

• Disseminate your ideas to others so that
people appreciate/use/cite them

• Graduate... Of course

• MS: need to write thesis to graduate...

• Ph.D: “Publish or Perish”

• Without good publications...

• No good job, no good career

• And possibly no good life either

• GPA: nobody cares

• Maintain about 3.0/4.0





Where to Start?

DB Conferences/Symposiums/Workshops (81)

ADB, ADBIS, ADBT, ADC, ARTDB, Berkeley Workshop, BNCOD, CDB, CIDR, CIKM, 
CISM, CISMOD, COMAD, COODBSE, CoopIS, DAISD, DANTE, DASFAA, DaWaK, 
DBPL, DBSEC, DDB, DDW, DEXA, DIWeB, DMDW, DMKD, DNIS, DOLAP, DOOD, 
DPDS, DS, EDBT, EDS, EFIS/EFDBS, ER, EWDW, FODO, FolKS, FQAS, Future 
Databases, GIS, HPTS, IADT, ICDE, ICDM, ICDT, ICOD, IDA, IDC(W), IDEAL, 
IDEAS, IDS, IGIS, IWDM, IW-MMDBMS, JCDKB, KDD, KR, KRDB, LID, MDA/MDM, 
MFDBS, MLDM, MSS, NLDB, OODBS, OOIS, PAKDD, PKDD, PODS, RIDE, RIDS, 
RTDB, SBBD, SDM-SIAM, Semantics in Databases, SIGMOD, SSD, SSDBM, SWDB, 
TDB, TSDM, UIDIS, VDB, VLDB, WebDB, WIDM, WISE, XP, XSym 

DB Journals (19) 

ACM TODS, ACM TOIS, DKE, Data Base, DMKD, DPD, IEEE Data Eng. Bulletin, 
IEEE TKDE, Info. Processing and Management, Info. Processing Letters, Info. 
Sciences, Info. Systems, J. of Cooperative Info. Systems, J. of Database 
Management, JIIS, KAIS, SIGKDD Explorations, SIGMOD Record, VLDB J. 



The list excludes Information Retrieval and Digital Library 



Where to Start? 



Start from good ones: 

- DB: SIGMOD, VLDB, ICDE, EDBT, ...

- DB Theory: PODS, ICDT, ...

- Data Mining: KDD, ICDM, SDM, ...

- Modeling: ER, ...

- Information Retrieval: SIGIR, CIKM, ...

- Digital Library: JCDL, ECDL, CIKM, ...

- Web: WWW, WebDB, ...

Look at DBLP: http://www.informatik.uni-trier.de/~ley/db/



Where to Start?

• Don't be afraid to read journal papers 

• DB field is a fast-moving discipline: 

• Latest techniques appear in conferences/workshops

• More mature work appears in journals

• Although longer than conference version, often 
easier to read 

• Lots of examples, figures, descriptions, ... 

• Examples: 

• ACM TODS, ACM TOIS, VLDB J., IEEE TKDE, ACM TOIT,
ACM Computing Surveys, CACM, SIGMOD Record, ...



Symptoms

• After chasing relevant works that are 
increasing super-exponentially fast, you 
would feel... 

• All relevant problems are ALREADY studied 
by someone else 

• Other fields have 1000+ years of history: Mathematics, Art, ...

• Problem is too BROAD for me to tackle 

• Divide-n-conquer 



How to Find the DARN 
Research Problem? 

• Easy but non-helpful answer: 

• Read and think and read and think and... 

• Subjective but MAYBE-helpful answer 

• MAP approach 

• MATRIX approach 

• DELTA approach 

• DROP approach 



What I Call M2D2



1. MAP Approach

• To start research, you initially have to read a lot of
papers anyway

• While reading those, why don’t you analyze and
summarize what you’ve read and put it into your
own words?

• Good for a survey paper - a MAP for future readers

• To be publishable, your survey must have a novel
viewpoint, taxonomy, comprehensive analysis, or
all of them

• Good target: ACM Computing Surveys, SIGMOD Record,
CACM, IEEE Computer, ...



2. MATRIX Approach

• Now, you have read a lot of papers

• Draw a MATRIX on a specific problem, and
map the papers that you read to cells of the matrix

• At the end, an unfilled cell is missing work
that no one has done

• But wait... first make sure that:

- The hole is worthwhile to fill in

- Doable (good as my dissertation topic?)

- Value (what's good?)
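
The matrix idea can be sketched in a few lines of Python. This is only an illustration: the axis labels and paper names are hypothetical placeholders rather than entries from any real survey, and `find_holes` is a helper name made up for this example.

```python
from itertools import product


def find_holes(rows, cols, papers):
    """Return the (row, col) cells that no paper in `papers` covers."""
    covered = {cell for cells in papers.values() for cell in cells}
    return [cell for cell in product(rows, cols) if cell not in covered]


# Hypothetical axes of the problem (placeholders, not a real taxonomy).
rows = ["technique-1", "technique-2", "technique-3"]
cols = ["setting-A", "setting-B"]

# Hypothetical reading list: each paper is mapped to the cells it addresses.
papers = {
    "paper-1": [("technique-1", "setting-A")],
    "paper-2": [("technique-1", "setting-B"), ("technique-2", "setting-A")],
    "paper-3": [("technique-3", "setting-A")],
}

holes = find_holes(rows, cols, papers)
for row, col in holes:
    print(f"unfilled cell: {row} x {col}")
```

An empty cell only means that nothing in your own reading list covers it; the checks listed above (worthwhile, doable, valuable) still have to be done by hand.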



Example: XML-Relational
Conversion Problem



3. DELTA Approach

• Arguably easiest... 

• Pick one paper of your interest

• Read it a lot - more than 10 times

• Find its limitations and extend it by DELTA

• Prove or demonstrate that

- The limitation that you pointed out is valid

- Your suggestion improves on the original work by DELTA

• The more well-known the work you choose, the harder
it is to improve, but the better for your reputation...

- E.g., “E.F. Codd’s relational model is insufficient to handle
the semi-structured model because...”

• The bigger the DELTA is, the better your paper 
gets 



Example: The optimal wedding problem

• When a person has a chance to date K
persons, the optimal wedding algorithm is:

    Date up to K/3 persons
    Let B be the best person among the first K/3, using a criterion C
    Start dating again from the (K/3+1)-th person, p
        If p is better than B using C
            Stop and marry p
        Otherwise, keep dating until the K-th person





• How many ways can we improve this algo? 
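
The algorithm above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: criterion C is modeled as a hypothetical `score` function (higher is better), K/3 is rounded down, and if nobody after the first K/3 beats B, the sketch settles for the K-th person, a case the slide leaves implicit.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def optimal_wedding(candidates: Sequence[T], score: Callable[[T], float]) -> T:
    """Pick a spouse from K candidates, met one at a time, using the K/3 rule."""
    k = len(candidates)
    if k == 0:
        raise ValueError("need at least one candidate")

    cutoff = k // 3  # K/3, rounded down (assumption)

    # Phase 1: date the first K/3 persons without committing.
    # B is the best of them under criterion C (here: the highest score).
    best_seen = max((score(c) for c in candidates[:cutoff]), default=float("-inf"))

    # Phase 2: from the (K/3+1)-th person on, marry the first one better than B.
    for p in candidates[cutoff:]:
        if score(p) > best_seen:
            return p

    # Assumption: nobody beat B, so settle for the K-th (last) person.
    return candidates[-1]


# Example: candidates are represented directly by their scores.
scores = [3, 7, 5, 2, 9, 8, 1, 4, 6]
print(optimal_wedding(scores, score=lambda s: s))  # prints 9
```

Each assumption made explicit here (the rounding of K/3, the fallback choice, the form of C) is itself a candidate DELTA.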




Possible DELTAs 

• Parameter fitting:

• How to determine K? Estimate? 

• How to determine C? Comparison? 

• Scalability? K=10 vs. K=10,000? Sub-optimal?

• Question the assumptions: 

• Monogamy vs. Polygamy vs. N-gamy? (How to find the n-th best
spouse fast?)

• Data distribution? Uniform/Poisson/Scale-free 

• Application to another domain? 

• System building?



Which DELTA to Choose

• Pick the DELTA that is the most significant

• Some criteria are:

- Has practical value

- Has a motivating scenario as of NOW, or

- Predicted to be useful in N years

- Non-trivial

• Hot topics:

- Streaming, XML, Sensor, ...



4. DROP Approach
(adapted from J. Widom's slides)

• Pick a simple but fundamental assumption
underlying traditional database systems

• DROP it

• Reconsider all aspects of data management
and query processing

• Many Ph.D. theses

• Prototype from scratch

From http://www-db.stanford.edu/~widom/stream.ppt

DB Seminar Talk, 2005, Dongwon Lee



Example: Two Stanford Projects

• The LORE Project

- Dropped assumption:
“Data has a fixed schema declared in advance”

- Semi-structured data (-> XML)

• The STREAM Project

- Dropped assumption:
“First load data, then index it, then run queries”

- Continuous data streams (+ continuous queries)

From http://www-db.stanford.edu/~widom/stream.ppt



Where to Submit?

• Top-down

- Aim at the best conference in the field

- If rejected, go to a next-tier conference or symposium

- If rejected, go to next...

• Bottom-up

- Aim at a workshop

- If accepted, work more and aim at a better one (symposium
or 2nd-tier conference)

- After making sure that the ideas are mature enough, aim at the
best conference



Avoid Some Notorious Venues

• “Randomly generated paper got accepted to a
conference... MIT Prank” (slashdot, 2005)

• http://pdos.csail.mit.edu/scigen/

• E.g., The World Multi-Conference on Systemics,
Cybernetics and Informatics (SCI)

• Along your career, you will get emails from unknown
venues to submit a paper, to serve as PC, etc.

• Be careful if the venue is not well-known

• Many of them are NON-REVIEWED, profit-oriented
events - no academic value whatsoever!!



Facts on Paper Reviews
(adapted from J. Cho’s slides)

• 3-4 reviewers per paper

• 10-20% acceptance rate for top-tier venues

• Very competitive

• Criteria

- Accept/Weak Accept

- Neutral

- Weak Reject/Reject

• One reject kills a paper

• At least Accept, Weak Accept and Neutral
About Reviewers

• 15-20 papers per reviewer

• Reviewer cannot spend 5-10 hours per paper

• 20 X 10 = 200 hours = (40 hours X 5) = 5 weeks!

• No reviewers can afford this

• Give a good impression in 1-2 hours!

Content comes next!

WARNING: Of course, to start with, your main idea
must be good to get into top-tier...



How to Give a Good Impression in 1-2 hours

1. Good introduction

• Everyone reads it

• If not interesting, people stop reading

2. Easy to read

• People should understand what you say

• Easy to confuse, difficult to understand

3. Build an excitement and a strong case

• What is good?

4. Broad reference

• Sometimes kills a paper

• Program committee members



Good Introduction

1. What’s the problem? 

2. Why is it important? 

Mention some application, existing problems 

3. Why is it difficult? 

Ask some not-very-obvious questions or explain naive 
approach 

4. What did others do?

5. What’s my contribution? 

Contribution bullet list (paper organization)

6. Build some excitement/surprise

Keep reading! You will find something interesting later

7. Every word should be carefully picked



How to Write an Intro

1. Start with 5 bullets 

• What’s the problem? 

• Why is it interesting? 

• ...

2. 1-2 sentence answer to each question

3. Add more content

4. Spend enough time on intro

Bullet points enough



Easy-to-Read Paper

• You can always make it complicated later

1. Lots of examples

2. Figures & Tables - Figures speak!!

• Summary of notations

3. Define models/architecture precisely

• Explicitly write down assumptions

• Input, output, property, goal function

4. Make a connection

• Why this experiment? 



Paper Organization (10 pages)

1. Introduction (2 pages)

2. Related Work (half page)

3. Framework (2 pages)

4. Main Ideas (3 pages)

5. Experiments (2 pages)

6. Conclusion (half page)

7. References (half page)

• Actual idea - only 3 pages!!!

• Even a tiny idea can turn into a good paper if you
DEVELOP it well





Importance of Personal 
Research Log 

• Maintain personal research log 

• Sketch your research ideas in writing

• Update your ideas as time passes 

• Occasionally go back to old writings 

• Prepare a short review for each paper that you read 

• Summary 

• Pros and cons 

• Limitations or problems 

• If needed, contact authors and ask questions 

• Usually authors are willing to discuss their work with readers



Start Writing Early On...

• Even if you feel you are NOT ready yet 

• Your advisor will throw away your initial draft 
anyway 

• Your initial submission will be rejected anyway 

• But you get 

• Experience (good or bad) that you can learn from

• Writing sharpens your ideas and gives more ideas 

• Writing can be improved only via writing



Fabrication and Plagiarism

• “Prominent Physicist Fired for Faking Data
Research: Bell Labs says scientist 'recklessly'
misrepresented work on microprocessors...” (2002,
LA Times)

http://www.latimes.com/news/science/la-sci-physicist26sep26.story

• “Constantinos V. Papadopoulos got caught
plagiarizing at EUROPAR (1995)... 7 papers
published and 8 under submission... all plagiarized
from Technical Reports...”

http://www.sics.se/europar95/plagiarism.html

• NEVER, EVER, do these - professional suicide!!



dbworld

• Be a member of dbworld newsgroup

- http://www.cs.wisc.edu/dbworld/

- Free membership

- Keep track of DB-related news